When we read the PHP 7.4 release announcement, we can see that it brings many exciting new features, such as typed properties, preloading, and FFI – foreign function interface. Some of these features have been long-awaited. For instance, typed properties, a few of them were quite surprising. Regardless of this, one thing is clear, from all mentioned features, “FFI – foreign function interface” is the least self-explanatory.
So, what is FFI? According to Wikipedia, it is: “a mechanism by which a program written in one programming language can call routines or make use of services written in another.”[1]. But this is merely a book definition, which does not tell us what real-world problems can be solved. This left us with a more proper question: What problems does FFI solve?
FFI in PHP
FFI can help you if you want to reuse a code that is already written in an entirely different programming language. For example, you have a connector to a database that has not yet been ported to PHP. Another possible usage is to speed up some parts of your code. In this case, you can write an algorithm in the programming language that can provide a better performance for your use; for instance, when you want to do some complicated scientific calculations. Finally, you can do something that is not supported in your language. To illustrate this, you can communicate with hardware.
You may ask yourself, wait, can we not already do all of this with PHP extensions? The answer is yes. We can do all of the mentioned tasks with the help of PHP extensions. But FFI still has its merits as it promises more straightforward usage, easier maintenance or deployment, and better portability across PHP versions. The main reason for this is that you can do everything in plain PHP, so there is no need to set a compilation toolkit or change deployment procedures.
Regarding portability, the PHP extension API can vary across versions, which usually leads to a need to recompile the whole extension. Another possible benefit is that there is no need to learn any new programming language. We are still in PHP, right? Well, this is more complicated, as we will see later.
There are a few limitations of FFI in comparison with extensions. You cannot easily modify how PHP works internally, so there is no easy way to create debug tools such as Xdebug. Also, PHP extensions are usually faster as they are made in compiled languages. The last limitation is that FFI supports only one target language; often, this language is C.
FFI is implemented in PHP 7.4 through a brand new extension that is part of the core and always available. It is also more mature than previous attempts on FFI in the PHP ecosystem and multiplatform, at least to some degree. The main caveat is that the extension allows the programmer to call only C libraries or with libraries providing so called “C ABI”.
IPC NEWSLETTER
All news about PHP and web development
First steps
We will illustrate basic usage of FFI by rewriting abs() function, which returns an absolute value of provided number:
abs.php
<?php $ffi = FFI::cdef( 'int abs(int j);', // function declaration in C language 'libc.so.6' // library from which the function will be called ); var_dump($ffi->abs(-42)); // int(42)
The first thing we have to do is to create a proxy object between the library and PHP. One of the biggest challenges in FFI, regardless of programming language, is how to map functions from one programming language in another one. Authors of FFI in PHP decided that the best way to handle this is by parsing a C function definition. So it is no surprise that the first argument contains a function declaration in C language. The second parameter must be the name of the library from which the function will be called. In this case, we are trying to use functions from the standard C library.
Code samples
All provided code samples require minimum version of PHP 7.4 and operating system Linux or Windows with WSL. They cannot be run in standard Windows or Mac OS. The complete source code is available at https://github.com/kambo-1st/php-ffi-article together with prepared Docker image.
The function signatures are quite similar in PHP and in C. After all, PHP is a C-style language. Nevertheless, there are a few differences:
- Variable types must always be defined.
- The return type is declared before the function name.
The function is then called through a method of the same name on the proxy object. Simple data types such as integer, float, and boolean are automatically converted during the function call. Automatic conversion can sometimes be a double-edged sword, especially for the PHP developers, as we will illustrate on the following snippet:
abs-data-type.php
<?php $ffi = FFI::cdef( 'int abs(int j); long int labs(long int j);', 'libc.so.6' ); var_dump($ffi->abs(-2147483649)); // int(2147483647) var_dump($ffi->labs(-2147483649)); // int(2147483649)
In this snippet, we first try to call the abs() function with a value outside of the C integer range. Thanks to that, the result value will overflow, and the function returns the wrong result. PHP counterpart of the abs() function will handle float and integer value without issue. If we want to get a proper number, we have to call labs() that will work with values outside of C integer type. Data types in C are much richer than in PHP, and this must be considered during the function call.
Complex data types
Unfortunately, complex data types such as arrays cannot be handled straightforwardly. We will demonstrate this on the example of a Quicksort algorithm from the standard C library:
C standard library qsort function signature (shortened code snippet)
void qsort ( void *array, // void pointer to array size_t array_size, // size of array size_t data_type_size, // size of array type int (*comp) (const void *a, const void *b) // function pointer to comparison function );
As we can see, the method’s signature contains a type that does not have a direct PHP counterpart. At this moment, we are interested in the first parameter, which represents an array that should be sorted. Definition void * represents a void pointer to an array that should be sorted. If we try to pass array directly, an exception will be thrown:
pass-php-array.php
<?php $ffi = FFI::cdef( "void qsort (void *array, size_t count, size_t size, int (*comp)(const void *a, const void *b));", "libc.so.6" ); $array = [3,1,2]; $ffi->qsort( $array, count($array), FFI::sizeof(FFI::type("int")), $cmp ); // Fatal error: Uncaught FFI\Exception: Passing incompatible argument 1 of C function 'qsort', expecting 'void*', found PHP 'array'
Instead of this we must pass an array pointer, but we should first discuss what the pointer is. The basic definition is rather simple – it is a variable that stores another variable’s memory address in computer memory. The most similar thing in PHP is a reference. But be aware that we cannot equate references and pointers. Actually, this is mentioned several times in PHP documentation! [2] Most complex types in C such as array and strings, are passed or returned as pointers.
YOU LOVE PHP?
Explore the PHP Core Track
PHP FFI has some support functions for working with pointers (e.g.: FFI::addr()). But all of these functions work only with special native C data types. Fortunately, these types can be easily created with the method FFI::new().
This method is rather flexible and allows us to create virtually all complex data types. We will illustrate its usage on creating a simple array:
simple-array.php
<?php $p1 = FFI::new("int[2]"); // array of two integers $p1[0] = 123; var_dump(count($p1)); // int(2) var_dump($p1[0]); // int(123)
Method new() is expecting a C definition of the data type. Arrays in C are not as flexible as their PHP counterparts, namely:
- They can hold only the specified data type, e.g.: integer.
- Their size is fixed and must be determined during its creation.
- They allow only integer indexes.
Besides that, we can treat them like a common PHP array, e.g.: their elements can be accessed as regular PHP array elements. They can be used in the foreach loop or as an argument of count() function. If we try to access element outside array range an exception will be thrown.
Finally we will pass the array to the method FFI::addr(), which return a wrap object representing pointer, that we can pass to qsort() function:
quick-sort.php (shortened code snippet)
<?php $array = FFI::new("int[3]"); $array[0] = 2; $array[1] = 3; $array[2] = 1; $ffi->qsort( FFI::addr($array), count($array), FFI::sizeof(FFI::type("int")), $cmp );
Another common complex data type is a string. Strings are usually internally implemented as arrays of individual characters, so it is not surprising that they are similarly treated in C. But there are few differences on the side of PHP FFI. As an example, we will use function strtok(), a direct counterpart to the PHP strtok() function:
string.php
<?php $ffi = FFI::cdef( 'char *strtok(char *str, const char *delim);', 'libc.so.6' ); $delimiter = '-'; $token = $ffi->strtok('foo-bar-baz', $delimiter); var_dump($token); echo FFI::string($token); // foo
Data type conversion, in this case, is partial and can be thus quite confusing. We can see that the input string “foo-bar-baz” has been converted automatically without further action. But return type deserves more attention, as we can see after the execution:
object(FFI\CData:char*)#2 (1) { [0]=> string(1) "f" } foo
We cannot access the returned value directly. It must be converted with the help of function FFI::string(). Again conversion is just partial.
The last data type which we will address is structure. It is another data type that does not have a direct counterpart in PHP. But we can imagine it as an object without any methods, just with properties.
Example of C structure (shortened code snippet)
struct Book { char title[50]; int book_id; }; struct Book Book1; Book1.book_id = 42;
As C is not an object-oriented language, structures are quite common. Each structure must be defined before its usage and the definition itself should be part of cdef. This example will reimplement function mktime() that get a UNIX timestamp from the provided date defined as a structure tm:
mktime.php (shortened code snippet)
<?php $ffi = FFI::cdef( "struct tm { int tm_sec; int tm_min; int tm_hour; // Rest omitted for the sake of clarity }; unsigned int mktime(struct tm * );", "libc.so.6" ); $tm = $ffi->new("struct tm"); $tm->tm_mday = 1; // days starts by 1 $tm->tm_mon = 0; // months starts by 0 (January) $tm->tm_year = 118; // years since 1900 var_dump($ffi->mktime(FFI::addr($tm))); // int(1514761200)
Instances of data structures can be created by the method new(), but this time the method should be called on an FFI object which contains a definition of the structure. If we do not do this an exception will be thrown.
mktime.php (shortened code snippet)
$tm = $ffi->new("struct tm"); $tm->tm_mday = 1;
Structure variables can be accessed in the same way as an object property in PHP.
Memory management
Probably the most complicated area of PHP FFI is the different memory management model in PHP and C. The situation is more complicated because it is usually not visible at first glance. Disclaimer: this section of the article does not have the ambition to provide a complete description of the C memory model as this is out of scope of this article.
PHP is a memory-safe language with a garbage collector. We do not need to know how the memory is allocated and how it will be freed. The only matter of concern is usually just the memory limit defined in PHP configuration. This is simply not true for C, as we must deal with:
- manual memory management,
- pointers
- and direct memory manipulation.
Because FFI is just a bridge between PHP and C, we will also have to deal at least with some aspects of the C memory model. This is especially true for passing or returning arrays/strings to C functions. PHP FFI is trying, to some degree, to manage memory for us. For example, all complex data types created by FFI new() method are by default managed by PHP (in the FFI terminology, they are ‘owned’). So, we do not have to care about their life cycle.
Security of PHP FFI
Usage of PHP FFI can lead to huge security risks as it allows virtually the same things as C. We can, for example, interface with the system on a very low level or even disable some PHP security measures by tempering the memory of the PHP interpreter through direct memory manipulation capability. As a security countermeasure, FFI is, by default, allowed only in the CLI or during script preloading. This can be changed in ffi.enable php.ini directive.
But this is sometimes not enough, and we have to be especially cautious in the following areas:
- Pointers obtained by the method FFI::addr(),
- individual elements of C arrays and structures
- and most data structures returned by C functions.
C Pointers are always non-owned. This can very easily lead to situations where the so-called “dangling pointer” is created. A dangling pointer is a pointer that does not point to a valid variable. This can happen, when the referenced variable is destroyed, but the pointer still points to its memory location. We will illustrate this on the following example:
dangling-pointer.php (shortened code snippet)
function dateTimeDefinition($ffi) { $time = $ffi->new("struct tm"); $time->tm_sec = 10; return FFI::addr($time); } $time = dateTimeDefinition($ffi); var_dump($time); // Print random rubbish value or crash script
The issue here is that PHP removed structure in variable $time after the function dateTimeDefinition() ends its execution. However, this will lead to the latter problems as we try to access the pointer pointing to this variable. What is even more problematic in this case is the reaction of PHP. Execution of the script will not be terminated and the variable $time will contain some random data.
Of course, this will not happen in pure PHP code, because PHP tracks which variables can be accessed and cannot be removed from memory. FFI::addr() method will not inform the PHP interpreter that the variable is still accessible.
Another surprising situation can occur during access of individual elements in C arrays and structures. They are also “not-owned”, again we will illustrate this on following example:
dangling-pointer-arrays.php
<?php $parent = FFI::new("int[2][2]"); // $parent is owned pointer $child = $parent[0]; // $child is not-owned part of $parent unset($parent); // $parent is deallocated ($child became dangling pointer) var_dump($child); // crash because dereferencing of dangling pointer
The most complicated topic is probably handling complex data types returned by C functions. In C, values can be returned in two ways:
- by value,
- as a pointer.
As we already mentioned – Arrays/strings in C are always passed/returned as pointers. When the arrays are in the game, the most crucial question is: Who allocates and deallocates memory? Usually, it is PHP FFI, but sometimes it is the called function. As usual, it will be best to show an example. In this case, we will reimplement the PHP function glob(). This function finds pathnames that match a specific pattern:
glob.php
$ffi = FFI::cdef( "typedef struct { unsigned int gl_pathc; // number of found items char **gl_pathv; // contains found files/folders unsigned int gl_offs; } glob_t; int glob(const char *__restrict, int, int (*)(const char *, int), glob_t *__restrict); void globfree(glob_t *);", "libc.so.6" ); $result = $ffi->new("glob_t"); $ffi->glob('*', null, null, FFI::addr($result)); for ($i = 0; $i < $result->gl_pathc; $i++) { echo FFI::string($result->gl_pathv[$i]).PHP_EOL; // found file name } $ffi->globfree(FFI::addr($result));
This example is, to some extent, similar to the previous ones. We create a structure glob_t and pass it as a pointer to glob() function. So far so good. After execution of the glob() function, structure will contain the number of found items (gl_pathc) and items itself (gl_pathv). We can then retrieve found items by traversing the returned array. We must be cautios – inside variable $result->gl_pathv will not be a regular PHP array. It will be an instance of wrapper object FFI\CData with result data. We can access them as individual elements of array eg.: $result->gl_pathv[0], but we should not exceed the array range. If we do this, we are risking segmentation failure and swift termination of script execution.
What is also new is calling the function globfree(). This function will clean up the memory allocated by the glob() function. Why do we have to do this? We do not know how many items function will find and thus PHP cannot allocate memory ahead. It is the responsibility of the glob() function to allocate memory in the array gl_pathv. Function globfree() is the only way to free this memory reliably as is provided by the authors of the glob() function. Information about how the memory is handled is usually described in function documentation.
Call PHP from C
PHP FFI conveniently allows calling a PHP code from the C. This is mainly useful for libraries, which expect user logic in form of callback functions. The most common examples can be GUI toolkits such as GTK. Usage of these toolkits is usually based on executing callback functions bound to form elements (buttons, text areas, etc.). The callback mechanism in C is implemented with the help of a function pointer. Let us go back to the previously mentioned qsort() definition:
C standard library qsort function signature (shortened code snippet)
void qsort ( void *array, // void pointer to array size_t array_size, // size of array size_t data_type_size, // size of array type int (*comp) (const void *a, const void *b) // function pointer to comparison function );
The last function parameter represents a function pointer to a comparison function. The signature of the function can be deduced from the parameter definition. In this case, callback expects two parameters a and b, defined by type const void *. Type const void * is a pointer to any data type, keyword const marks that we should not modify the provided value. An integer value is expected as a callback result.
Main reason for the existence of this comparison function is the universality of the qsort() function. It must be able to sort arrays of any provided type. This is also the reason behind the existence of third parameter data_type_size. Similar code in plain PHP will look like this:
usort.php
<?php $cmp = function ($a, $b) { if ($a == $b) { return 0; } return ($a < $b) ? -1 : 1; }; $array = [2,3,1]; usort($array, $cmp); var_export($array);
But back to qsort() function. Comparison function should be implemented as regular PHP closure and passed as parameter to qsort() function as we can see in following code:
quick-sort.php (full listing)
<?php $ffi = FFI::cdef( "void qsort (void *array, size_t count, size_t size, int (*comp) (const void *a, const void *b));", "libc.so.6" ); $array = FFI::new("int[3]"); $array[0] = 2; $array[1] = 3; $array[2] = 1; $cmp = function (FFI\CData $a, FFI\CData $b) { $aInt = FFI::cast("int", $a)->cdata; $bInt = FFI::cast("int", $b)->cdata; if ($aInt === $bInt) { return 0;} return ($aInt < $bInt) ? -1 : 1; }; $ffi->qsort(FFI::addr($array), count($array), FFI::sizeof(FFI::type("int")), $cmp); var_dump($array);
Unfortunately, callbacks are not without issues as they are not supported on all platforms, inefficient, and leaks resources. Especially resource leak is a grave issue as this means that after a significant number of callback calls, PHP will exhaust all available process memory. This makes callbacks virtually unusable for long-running applications.
PHP FFI re-evaluation
PHP FFI can be a powerful tool when we want to interact with code written in C. But this is not always without any significant challenges. As we can see multiple times in the article, knowledge of C and especially its memory model is a basic need. What complicates the matter more are striking similarities between C and PHP.
IPC NEWSLETTER
All news about PHP and web development
Developers can easily assume that things will work the same way in both languages. This is further enhanced by similar function names in standard libraries of both languages and the same language constructs. It can be surprising to a PHP programmer how unforgiving C is. You can shoot yourself in the foot in many creative ways. But more importantly, the use of FFI and especially the FFI pointers can fundamentally change how the PHP is working and can lead to some wildly unpredictable situations or in severe memory leaks. Ignoring rules also very often ends in crashes of PHP interpreter.
The usage of FFI, as a way of obtaining an instant performance boost, should be carefully considered. There is always a price that must be paid during the call of function or during the creation of C data structure, and generally it makes sense to call a bigger chunk of C code. [3] Also, security should be considered from the beginning. Bad sanitation of user input or allowing to use FFI directly can be even more dangerous than exposing an attack vendor in the form of eval().
But FFI has its advantages. It is a straightforward way to prototype things and allows us, at least partially, to see how the world functions outside the PHP ecosystem. In plain words: It is fun until someone gets hurt!
Links & Literature
[1] https://en.wikipedia.org/wiki/Foreign_function_interface
[2] https://www.php.net/manual/en/language.references.arent.php